214 research outputs found
Learning Neural Implicit through Volume Rendering with Attentive Depth Fusion Priors
Learning neural implicit representations has achieved remarkable performance
in 3D reconstruction from multi-view images. Current methods use volume
rendering to render implicit representations into either RGB or depth images
that are supervised by multi-view ground truth. However, rendering one view at
a time suffers from incomplete depth at holes and from unawareness of occluded
structures in the depth supervision, which severely degrades the accuracy of
geometry inference via volume rendering. To resolve this issue, we propose to
learn neural implicit representations from multi-view RGBD images through
volume rendering with an attentive depth fusion prior. Our prior allows neural
networks to perceive coarse 3D structures from the Truncated Signed Distance
Function (TSDF) fused from all depth images available for rendering. The TSDF
provides access to depth that is missing at holes in a single depth image and
to occluded parts that are invisible from the current view. By introducing a novel
attention mechanism, we allow neural networks to directly use the depth fusion
prior with the inferred occupancy as the learned implicit function. Our
attention mechanism works with either a one-time fused TSDF that represents a
whole scene or an incrementally fused TSDF that represents a partial scene in
the context of Simultaneous Localization and Mapping (SLAM). Our evaluations on
widely used benchmarks including synthetic and real-world scans show our
superiority over the latest neural implicit methods. Project page:
https://machineperceptionlab.github.io/Attentive_DF_Prior/
Comment: NeurIPS 2023
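As a rough illustration of querying a fused TSDF as a prior and combining it attentively with a network prediction, the following PyTorch sketch blends a trilinearly interpolated TSDF value with an MLP-predicted occupancy. The layer sizes, the crude TSDF-to-occupancy mapping, and the blending form are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an attentive depth fusion prior (illustrative, not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveTSDFPrior(nn.Module):
    """Blend a TSDF prior queried from a fused grid with a learned occupancy."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.occ_mlp = nn.Sequential(              # predicts occupancy from a 3D point
            nn.Linear(3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1))
        self.attn_mlp = nn.Sequential(             # attention weight between prior and prediction
            nn.Linear(3 + 1, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, pts, tsdf_grid):
        # pts: (N, 3) samples in [-1, 1]^3; tsdf_grid: (1, 1, D, H, W) fused TSDF volume.
        grid = pts.view(1, -1, 1, 1, 3)            # grid_sample expects (x, y, z) ordering
        prior = F.grid_sample(tsdf_grid, grid, align_corners=True).view(-1, 1)
        prior_occ = torch.sigmoid(-prior * 10.0)   # crude TSDF -> occupancy proxy
        pred_occ = torch.sigmoid(self.occ_mlp(pts))            # learned occupancy
        w = self.attn_mlp(torch.cat([pts, prior], dim=-1))     # attention weight in [0, 1]
        return w * prior_occ + (1.0 - w) * pred_occ            # attentively fused occupancy

tsdf = torch.randn(1, 1, 64, 64, 64)               # fused TSDF volume (illustrative resolution)
pts = torch.rand(1024, 3) * 2.0 - 1.0              # ray samples in normalized coordinates
occ = AttentiveTSDFPrior()(pts, tsdf)              # (1024, 1) occupancy for volume rendering
```

The fused occupancy can then be used to form volume-rendering weights along each ray in the usual way.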
3D Shape Completion with Multi-view Consistent Inference
3D shape completion is important to enable machines to perceive the complete
geometry of objects from partial observations. To address this problem,
view-based methods have been presented. These methods represent shapes as
multiple depth images, which can be back-projected to yield corresponding 3D
point clouds, and they perform shape completion by learning to complete each
depth image using neural networks. While view-based methods lead to
state-of-the-art results, they currently do not enforce geometric consistency
among the completed views during the inference stage. To resolve this issue, we
propose a multi-view consistent inference technique for 3D shape completion,
which we express as an energy minimization problem including a data term and a
regularization term. We formulate the regularization term as a consistency loss
that encourages geometric consistency among multiple views, while the data term
keeps the optimized views from drifting too far from a learned shape
shape descriptor. Experimental results demonstrate that our method completes
shapes more accurately than previous techniques.
Comment: Accepted to AAAI 2020 as an oral presentation.
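The energy-minimization view can be sketched as follows. This simplified PyTorch example uses a squared deviation from the initial completed depths as a stand-in for the paper's shape-descriptor data term, and a symmetric Chamfer distance between back-projected views as the consistency term; the function names, `lam`, `steps`, and `lr` are illustrative choices.

```python
# Minimal sketch of multi-view consistent inference as energy minimization
# (the paper's actual data and consistency terms differ; this is illustrative).
import torch

def backproject(depth, K_inv):
    """Lift an (H, W) depth map to an (H*W, 3) camera-space point cloud."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float().reshape(-1, 3)
    return (pix @ K_inv.T) * depth.reshape(-1, 1)

def chamfer(a, b):
    """Symmetric Chamfer distance between two small point sets."""
    d = torch.cdist(a, b)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def refine_views(init_depths, K_inv, poses, lam=0.1, steps=100, lr=1e-2):
    """Jointly refine completed depth views for cross-view geometric consistency."""
    depths = [d.clone().requires_grad_(True) for d in init_depths]
    opt = torch.optim.Adam(depths, lr=lr)
    for _ in range(steps):
        # Back-project each view into world coordinates with its pose (R, t).
        clouds = [backproject(d, K_inv) @ R.T + t for d, (R, t) in zip(depths, poses)]
        data = sum(((d - d0) ** 2).mean() for d, d0 in zip(depths, init_depths))
        cons = sum(chamfer(clouds[i], clouds[j])
                   for i in range(len(clouds)) for j in range(i + 1, len(clouds)))
        loss = data + lam * cons
        opt.zero_grad(); loss.backward(); opt.step()
    return [d.detach() for d in depths]
```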
Coordinate Quantized Neural Implicit Representations for Multi-view Reconstruction
In recent years, huge progress has been made on learning neural implicit
representations from multi-view images for 3D reconstruction. As an additional
input complementing coordinates, using sinusoidal functions as positional
encodings plays a key role in revealing high frequency details with
coordinate-based neural networks. However, high frequency positional encodings
make the optimization unstable, which results in noisy reconstructions and
artifacts in empty space. To resolve this issue in a general sense, we
propose to learn neural implicit representations with quantized coordinates,
which reduces the uncertainty and ambiguity in the field during optimization.
Instead of using continuous coordinates directly, we map them to discrete
coordinates via nearest interpolation among quantized coordinates, which are
obtained by discretizing the field at an extremely high resolution.
We use discrete coordinates and their positional encodings to learn implicit
functions through volume rendering. This significantly reduces the variations
in the sample space and imposes more multi-view consistency constraints at
intersections of rays from different views, which enables the implicit
function to be inferred more effectively. Our quantized coordinates add no
computational burden and work seamlessly with the latest methods. Our
evaluations under the widely used benchmarks show our superiority over the
state-of-the-art. Our code is available at
https://github.com/MachinePerceptionLab/CQ-NIR.
Comment: To appear at ICCV 2023.
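A minimal sketch of the quantization step, assuming a hypothetical grid resolution and a standard sinusoidal positional encoding (the paper's exact resolution and frequency settings may differ):

```python
# Snap continuous ray samples to a very fine grid before positional encoding (illustrative values).
import torch

def quantize(coords, resolution=2048, low=-1.0, high=1.0):
    """Snap coordinates in [low, high] to the nearest of `resolution` discrete levels."""
    step = (high - low) / (resolution - 1)
    return torch.round((coords - low) / step) * step + low

def positional_encoding(coords, num_freqs=6):
    """Standard sinusoidal encoding applied to (already quantized) coordinates."""
    freqs = 2.0 ** torch.arange(num_freqs) * torch.pi
    angles = coords.unsqueeze(-1) * freqs              # (..., 3, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                   # (..., 3 * 2 * num_freqs)

pts = torch.rand(4, 3) * 2.0 - 1.0                     # ray samples in [-1, 1]^3
encoded = positional_encoding(quantize(pts))           # input to the implicit MLP
```

Because nearby samples collapse onto the same discrete coordinates, rays from different views that intersect the same region see identical inputs, which is what induces the additional multi-view consistency constraints.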
Learning Signed Distance Functions from Noisy 3D Point Clouds via Noise to Noise Mapping
Learning signed distance functions (SDFs) from 3D point clouds is an
important task in 3D computer vision. However, without ground truth signed
distances, point normals or clean point clouds, current methods still struggle
to learn SDFs from noisy point clouds. To overcome this challenge, we
propose to learn SDFs via a noise to noise mapping, which does not require any
clean point cloud or ground truth supervision for training. Our novelty lies in
the noise to noise mapping which can infer a highly accurate SDF of a single
object or scene from its multiple or even single noisy point cloud
observations. Our novel learning scheme is supported by modern LiDAR systems,
which capture multiple noisy observations per second. We achieve this by a
novel loss which enables statistical reasoning on point clouds and maintains
geometric consistency although point clouds are irregular, unordered and have
no point correspondence among noisy observations. Our evaluation under the
widely used benchmarks demonstrates our superiority over the state-of-the-art
methods in surface reconstruction, point cloud denoising and upsampling. Our
code, data, and pre-trained models are available at
https://github.com/mabaorui/Noise2NoiseMapping/
Comment: To appear at ICML 2023. Code and data are available at
https://github.com/mabaorui/Noise2NoiseMapping
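A minimal sketch of the noise-to-noise training signal, assuming a generic point-cloud denoiser and an exact Earth Mover's Distance computed with Hungarian matching on small, equally sized clouds; the paper's actual formulation learns an SDF and includes a geometric-consistency term that is not reproduced here.

```python
# Supervise a denoiser with another noisy observation of the same surface (illustrative).
import torch
from scipy.optimize import linear_sum_assignment

def emd_loss(pred, target):
    """Earth Mover's Distance between equally sized point sets (small sets only)."""
    cost = torch.cdist(pred, target)                        # (N, N) pairwise distances
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    return cost[rows, cols].mean()                          # differentiable w.r.t. pred

def noise2noise_step(denoiser, noisy_a, noisy_b, optimizer):
    """One training step: no clean ground truth, only two noisy scans of the same object."""
    denoised = denoiser(noisy_a)
    loss = emd_loss(denoised, noisy_b)                      # statistical reasoning across noise
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```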
Latent Partition Implicit with Surface Codes for 3D Representation
Deep implicit functions have shown remarkable shape modeling ability in
various 3D computer vision tasks. One drawback is that it is hard for them to
represent a 3D shape as multiple parts. Current solutions learn various
primitives and blend them directly in the spatial space, but they still
struggle to approximate the 3D shape accurately. To resolve this problem, we
introduce a novel implicit representation to represent a single 3D shape as a
set of parts in the latent space, towards both highly accurate and plausibly
interpretable shape modeling. Our insight here is that both part learning and
part blending can be conducted much more easily in the latent space than in
the spatial space. We name our method Latent Partition Implicit (LPI) because
it casts the modeling of a global shape into the modeling of multiple local
parts, which partitions the global shape. LPI represents a shape as
Signed Distance Functions (SDFs) using surface codes. Each surface code is a
latent code representing a part whose center is on the surface, which enables
us to flexibly employ intrinsic attributes of shapes or additional surface
properties. Eventually, LPI can reconstruct both the shape and the parts on the
shape, both of which are plausible meshes. LPI is a multi-level representation,
which can partition a shape into different numbers of parts after training. LPI
can be learned without ground truth signed distances, point normals or any
supervision for part partition. LPI outperforms the latest methods under the
widely used benchmarks in terms of reconstruction accuracy and modeling
interpretability. Our code, data and models are available at
https://github.com/chenchao15/LPI.
Comment: 20 pages, 14 figures. Accepted by ECCV 2022.
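A minimal sketch of blending latent part codes before decoding a signed distance, assuming a fixed number of parts, a distance-based soft weighting, and a small shared decoder; these are illustrative stand-ins rather than LPI's actual surface-code construction.

```python
# Blend part codes in latent space, then decode a signed distance (illustrative).
import torch
import torch.nn as nn

class LatentPartSDF(nn.Module):
    def __init__(self, num_parts=8, code_dim=128):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_parts, 3) * 0.1)   # part centers (ideally on the surface)
        self.codes = nn.Parameter(torch.randn(num_parts, code_dim))    # one surface code per part
        self.decoder = nn.Sequential(
            nn.Linear(code_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 1))                                         # shared SDF decoder

    def forward(self, pts):
        # pts: (N, 3) query points; weight parts by proximity, blend codes in latent space, decode.
        dists = torch.cdist(pts, self.centers)                         # (N, P)
        weights = torch.softmax(-dists * 10.0, dim=-1)                 # closer parts get more weight
        blended = weights @ self.codes                                 # (N, code_dim) latent blending
        return self.decoder(torch.cat([blended, pts], dim=-1))         # (N, 1) signed distance
```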
Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network
Exploring contextual information in the local region is important for shape
understanding and analysis. Existing studies often employ hand-crafted or
explicit ways to encode contextual information of local regions. However, such
manners struggle to capture fine-grained contextual information, such as the
correlation between different areas in a local region, which limits the
discriminative ability of the learned features. To resolve this
issue, we propose a novel deep learning model for 3D point clouds, named
Point2Sequence, to learn 3D shape features by capturing fine-grained contextual
information in a novel implicit way. Point2Sequence employs a novel sequence
learning model for point clouds to capture the correlations by aggregating
multi-scale areas of each local region with attention. Specifically,
Point2Sequence first learns the feature of each area scale in a local region.
Then, it captures the correlation between area scales in the process of
aggregating all area scales using a recurrent neural network (RNN) based
encoder-decoder structure, where an attention mechanism is proposed to
highlight the importance of different area scales. Experimental results show
that Point2Sequence achieves state-of-the-art performance in shape
classification and segmentation tasks.
Comment: To be published in AAAI 2019.
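A minimal sketch of the multi-scale aggregation idea, assuming a fixed set of area scales, a shared point MLP with max pooling per scale, and attention over LSTM states in place of the paper's full encoder-decoder:

```python
# Aggregate multi-scale area features of one local region with an RNN and attention (illustrative).
import torch
import torch.nn as nn

class MultiScaleRegionEncoder(nn.Module):
    def __init__(self, feat_dim=128, scales=(16, 32, 64, 128)):
        super().__init__()
        self.scales = scales
        self.point_mlp = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                       nn.Linear(feat_dim, feat_dim))
        self.rnn = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.attn = nn.Linear(feat_dim, 1)

    def forward(self, region_pts):
        # region_pts: (M, 3) points of one local region, sorted by distance to its centroid.
        scale_feats = []
        for k in self.scales:                                   # nested areas of growing size
            area = region_pts[: min(k, region_pts.shape[0])]
            scale_feats.append(self.point_mlp(area).max(dim=0).values)
        seq = torch.stack(scale_feats).unsqueeze(0)             # (1, S, feat_dim) scale sequence
        hidden, _ = self.rnn(seq)                               # correlations across scales
        weights = torch.softmax(self.attn(hidden), dim=1)       # attention over scales
        return (weights * hidden).sum(dim=1).squeeze(0)         # (feat_dim,) region feature
```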